GenBlastA: enabling BLAST to identify homologous gene sequences.
Abstract
BLAST is a widely used local similarity search tool for identifying homologous sequences. When a gene sequence (either a protein or a nucleotide sequence) is used as a query to search a genome for homologous sequences, the search results, represented as a list of high-scoring pairs (HSPs), are fragments of candidate genes rather than full-length candidate genes. Relevant HSPs ("signals"), which represent candidate genes in the target genome sequences, are buried within a report that also contains hundreds to thousands of random HSPs ("noise"). Consequently, BLAST results are often overwhelming and confusing, even to experienced users. To use BLAST effectively, a program is needed that extracts from the full HSP report the relevant HSPs representing candidate homologous genes. To achieve this goal, we have designed a graph-based algorithm, genBlastA, which automatically filters HSPs into well-defined groups, each representing a candidate gene in the target genome. The novelty of genBlastA is an edge length metric that reflects a set of biologically motivated requirements, so that each shortest path corresponds to an HSP group representing a homologous gene. We have demonstrated that this novel algorithm is both efficient and accurate for identifying homologous sequences, and that it outperforms existing approaches with similar functionality.
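The idea of grouping HSPs by a shortest path through a graph can be illustrated with a minimal sketch. This is not genBlastA's actual edge length metric, which the abstract does not specify. The `HSP` class, the `best_hsp_group` function, and the parameters `max_gap` and `gap_weight` are hypothetical names chosen for illustration. The toy metric assumed here charges one unit for every query position left uncovered plus a small penalty for genomic distance between consecutive HSPs. It ignores strand and alignment score, and it returns only a single best group.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class HSP:
    """One high-scoring pair: 1-based inclusive query and target coordinates."""
    q_start: int
    q_end: int
    t_start: int
    t_end: int


def best_hsp_group(hsps: List[HSP], query_len: int,
                   max_gap: int = 10_000,
                   gap_weight: float = 0.001) -> List[HSP]:
    """Return the HSP chain on the shortest source-to-sink path.

    Toy edge lengths (an assumption, not the published metric):
      source -> v : query positions before v
      u -> v      : uncovered query positions between u and v,
                    plus gap_weight * genomic gap (must be 0..max_gap)
      v -> sink   : query positions after v
    """
    if not hsps:
        return []

    # Sorting by query start gives a topological order, because an edge
    # u -> v is only allowed when u ends before v starts on the query.
    order = sorted(hsps, key=lambda h: h.q_start)
    n = len(order)
    dist = [float(h.q_start - 1) for h in order]   # direct edge from source
    prev: List[Optional[int]] = [None] * n

    for j in range(n):
        v = order[j]
        for i in range(j):
            u = order[i]
            q_gap = v.q_start - u.q_end - 1
            t_gap = v.t_start - u.t_end - 1
            # Keep only colinear, non-overlapping, nearby pairs.
            if q_gap < 0 or t_gap < 0 or t_gap > max_gap:
                continue
            cand = dist[i] + q_gap + gap_weight * t_gap
            if cand < dist[j]:
                dist[j] = cand
                prev[j] = i

    # Close each path with its edge to the sink and pick the shortest.
    totals = [dist[j] + (query_len - order[j].q_end) for j in range(n)]
    end: Optional[int] = min(range(n), key=totals.__getitem__)

    group: List[HSP] = []
    while end is not None:
        group.append(order[end])
        end = prev[end]
    group.reverse()
    return group
```

Given three colinear HSPs that tile the query and two unrelated hits elsewhere in the genome, the shortest path runs through the three colinear HSPs and leaves the unrelated hits out. A fuller treatment would also have to handle both strands, overlapping HSPs, and multiple groups per query.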
Similar resources
Molecular Detection of Novel Genetic Variants Associated to Anaplasma ovis among Dromedary Camels in Iran
To the best of our knowledge, little information is available regarding the presence of Anaplasma species in camels in Iran. This study sought to investigate the presence of Anaplasma species by microscopy and polymerase chain reaction (PCR) assays in 100 healthy dromedaries (Camelus dromedarius) arriving for slaughter. The microscopic examination of Giemsa-stained blood films revealed that Ana...
Sequencing and phylogenetic study of APETALA1 homologous gene in garden cress (Lepidium sativum L.)
The flowering process in plants proceeds through the induction of an inflorescence meristem triggered by several pathways. Many of the genes associated with these pathways encode transcription factors of the MADS domain family. The MADS-domain transcription factor APETALA1 (AP1) is a key regulator of flower development. The first step to understand the molecular mechanisms under the function of...
Molecular and Bioinformatics Analysis of Allelic Diversity in IGFBP2 Gene Promoter in Indigenous Makuee and Lori-Bakhtiari Sheep Breeds
The aim of this study was to perform molecular and bioinformatics analysis of IGFBP2 gene promoter in association with some economic traits in indigenous Makuee (MS) and Lori-Bakhtiari (LB) breeds. DNA was extracted from blood samples of 120 MS and 200 LB and a 297 bp fragment from the upstream sequences of studied gene was amplified and genotyped by single-strand conformational polymo...
Database Searching and BLAST, Tuesday, October 27th
The goal of a database search is to find all “high-scoring” local alignments (i.e., local alignments with a score above a given threshold) and to determine the significance of alignments found. A database search can be used to compare a protein or a cDNA sequence with genomic DNA, e.g. to find gene location or identify intron/exon boundaries. Another application is to find homologous protein seq...
Genetic and Molecular Dissection of Blast Resistance in Rice Using RFLP, Simple Sequence Repeats and Defense-Related Candidate Gene Markers
Blast, Pyricularia grisea (Cooke) Sacc., is one of the most destructive diseases of rice worldwide and can result in significant reductions in yield. The use of resistant cultivars is the most economical and effective way of controlling rice blast. A variety of DNA markers, including plant defense-related candidate gene markers are available for genetic characterization and molec...
Journal:
- Genome Research
Volume 19, Issue 1
Pages: -
Publication date: 2009